Bootstrapping statistical parsers from small datasets

Authors

  • Mark Steedman
  • Anoop Sarkar
  • Miles Osborne
  • Rebecca Hwa
  • Stephen Clark
  • Julia Hockenmaier
  • Paul Ruhlen
  • Steven Baker
  • Jeremiah Crim
Abstract

We present a practical co-training method for bootstrapping statistical parsers using a small amount of manually parsed training material and a much larger pool of raw sentences. Experimental results show that unlabelled sentences can be used to improve the performance of statistical parsers. In addition, we consider the problem of bootstrapping parsers when the manually parsed training material is in a different domain to either the raw sentences or the testing material. We show that bootstrapping continues to be useful, even though no manually produced parses from the target domain are used.
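As a rough illustration of the co-training procedure the abstract describes, the sketch below grows two parsers' training sets from each other's most confident parses of raw sentences. The parser interface (train / parse / score), the cache size, and the selection heuristic are assumptions made for illustration, not the authors' actual implementation.

```python
# A minimal, illustrative sketch of co-training two statistical parsers
# from a small labelled seed set plus a pool of raw (unlabelled) sentences.
# The Parser interface and selection heuristic below are assumed, not taken
# from the paper.

def select_most_confident(scored, k=20):
    """Keep the k parses whose producing parser scored them highest."""
    scored = sorted(scored, key=lambda item: item[2], reverse=True)
    return [(sentence, parse) for sentence, parse, _ in scored[:k]]

def co_train(parser_a, parser_b, labelled, unlabelled, rounds=10, cache_size=50):
    """Bootstrap two parsers from a small labelled seed and raw sentences."""
    train_a, train_b = list(labelled), list(labelled)
    for _ in range(rounds):
        parser_a.train(train_a)
        parser_b.train(train_b)

        # Draw a small cache of raw sentences from the unlabelled pool.
        cache, unlabelled = unlabelled[:cache_size], unlabelled[cache_size:]
        if not cache:
            break

        # Each parser labels the cache; one parser's most confident parses
        # become additional training material for the other parser.
        scored_a = [(s, parser_a.parse(s), parser_a.score(s)) for s in cache]
        scored_b = [(s, parser_b.parse(s), parser_b.score(s)) for s in cache]
        train_b += select_most_confident(scored_a)
        train_a += select_most_confident(scored_b)

    return parser_a, parser_b
```

The key design point of co-training is that each parser's confident output is added to the other parser's training data, so the two models must make reasonably independent errors for the exchange to help.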


Similar articles

Example Selection for Bootstrapping Statistical Parsers

This paper investigates bootstrapping for statistical parsers to reduce their reliance on manually annotated training data. We consider both a mostly-unsupervised approach, co-training, in which two parsers are iteratively re-trained on each other’s output; and a semi-supervised approach, corrected co-training, in which a human corrects each parser’s output before adding it to the training data...

Bootstrapping a neural net dependency parser for German using CLARIN resources

Statistical dependency parsers have quickly gained popularity in the last decade by providing a good trade-off between parsing accuracy and parsing speed. Such parsers usually rely on handcrafted symbolic features and linear discriminative classifiers to make attachment choices. Recent work replaces these with dense word embeddings and neural nets with great success for parsing English and Chin...

Self-Training for Enhancement and Domain Adaptation of Statistical Parsers Trained on Small Datasets

Creating large amounts of annotated data to train statistical PCFG parsers is expensive, and the performance of such parsers declines when training and test data are taken from different domains. In this paper we use selftraining in order to improve the quality of a parser and to adapt it to a different domain, using only small amounts of manually annotated seed data. We report significant impr...
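For contrast with co-training, the sketch below shows the self-training loop this related paper describes: a single parser is repeatedly re-trained on its own highest-scoring parses of raw sentences. The parser interface and selection parameters are again illustrative assumptions, not the paper's setup.

```python
# A minimal, illustrative sketch of self-training a single parser from a
# small annotated seed set. Interface and thresholds are assumptions.

def self_train(parser, seed, unlabelled, rounds=10, batch=50, keep=20):
    """Grow one parser's training set from its own parses of raw text."""
    training = list(seed)
    for _ in range(rounds):
        parser.train(training)

        batch_sents, unlabelled = unlabelled[:batch], unlabelled[batch:]
        if not batch_sents:
            break

        # Parse the raw sentences and keep only the highest-scoring parses.
        scored = sorted(((s, parser.parse(s), parser.score(s)) for s in batch_sents),
                        key=lambda item: item[2], reverse=True)
        training += [(s, p) for s, p, _ in scored[:keep]]

    return parser
```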

Bootstrapping Feature-Rich Dependency Parsers with Entropic Priors

One may need to build a statistical parser for a new language, using only a very small labeled treebank together with raw text. We argue that bootstrapping a parser is most promising when the model uses a rich set of redundant features, as in recent models for scoring dependency parses (McDonald et al., 2005). Drawing on Abney’s (2004) analysis of the Yarowsky algorithm, we perform bootstrappin...

Arabic Tweets Treebanking and Parsing: A Bootstrapping Approach

In this paper, we propose using a "bootstrapping" method for constructing a dependency treebank of Arabic tweets. This method uses a rule-based parser to create a small treebank of one thousand Arabic tweets and a data-driven parser to create a larger treebank by using the small treebank as a seed training set. We are able to create a dependency treebank from unlabelled tweets without any manua...


Journal:

Volume   Issue

Pages  -

Publication date: 2003